LOcating Non-Unique matched Tags (LONUT) to Improve the Detection of the Enriched Regions for ChIP-seq Data
Authors
Abstract
A major limitation of computational tools for analyzing ChIP-seq data is that most of them ignore non-unique tags (NUTs) that match the human genome, even though NUTs comprise up to 60% of all raw tags in ChIP-seq data. Effectively utilizing these NUTs would increase the sequencing depth and allow more accurate detection of enriched binding sites, which in turn could lead to more precise and significant biological interpretations. In this study, we have developed a computational tool, LOcating Non-Unique matched Tags (LONUT), to improve the detection of enriched regions from ChIP-seq data. The LONUT algorithm applies a linear and polynomial regression model to establish an empirical score (ES) formula that considers two influential factors: the distance of NUTs to peaks identified using uniquely matched tags (UMTs), and the enrichment score of those peaks. Using this score, each NUT is assigned to a unique location on the reference genome. The newly located tags from the set of NUTs are combined with the original UMTs to produce a final set of combined matched tags (CMTs). LONUT was tested on many different datasets representing three different characteristics of biological data types. The detected sites were validated using de novo motif discovery and ChIP-PCR. We demonstrate the specificity and accuracy of LONUT and show that our program not only improves the detection of binding sites for ChIP-seq, but also identifies additional binding sites.
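The assignment step described in the abstract can be illustrated with a minimal Python sketch. It is not the LONUT implementation: the abstract does not give the ES formula or its regression coefficients, so `empirical_score` below is a hypothetical stand-in that simply rises with peak enrichment and decays with distance. The names `Peak`, `assign_nut`, `build_cmts`, and the `scale` parameter are likewise assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    chrom: str
    summit: int        # summit position of a peak called from UMTs only
    enrichment: float  # enrichment score reported for that peak


def empirical_score(distance, enrichment, scale=200.0):
    # Hypothetical stand-in for the ES formula (the real one is fitted by
    # regression and is not given in the abstract): higher enrichment and
    # shorter distance both raise the score.
    return enrichment / (1.0 + distance / scale)


def assign_nut(candidates, peaks_by_chrom):
    """Return the candidate (chrom, pos) with the highest score.

    Returns None when no candidate lies on a chromosome that has a peak.
    """
    best, best_score = None, float("-inf")
    for chrom, pos in candidates:
        for peak in peaks_by_chrom.get(chrom, []):
            score = empirical_score(abs(pos - peak.summit), peak.enrichment)
            if score > best_score:
                best, best_score = (chrom, pos), score
    return best


def build_cmts(umts, nuts, peaks_by_chrom):
    """Combine UMTs with the NUTs that could be placed at a single location."""
    located = [assign_nut(cands, peaks_by_chrom) for cands in nuts]
    return list(umts) + [loc for loc in located if loc is not None]


# Toy data, invented for illustration only.
peaks = {
    "chr1": [Peak("chr1", 1000, 50.0)],
    "chr2": [Peak("chr2", 5000, 10.0)],
}
umts = [("chr1", 990), ("chr2", 5005)]
nuts = [
    [("chr1", 1100), ("chr2", 5000)],    # close to a strongly enriched peak on chr1
    [("chr1", 100000), ("chr2", 5010)],  # only the chr2 candidate is near a peak
    [("chr3", 42)],                      # no peak on chr3, so it stays unplaced
]
cmts = build_cmts(umts, nuts, peaks)
```

In this sketch a NUT with no nearby evidence is dropped rather than placed arbitrarily; whether LONUT handles such tags the same way is not stated in the abstract.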
Similar resources
Extracting transcription factor targets from ChIP-Seq data
ChIP-Seq technology, which combines chromatin immunoprecipitation (ChIP) with massively parallel sequencing, is rapidly replacing ChIP-on-chip for the genome-wide identification of transcription factor binding events. Identifying bound regions from the large number of sequence tags produced by ChIP-Seq is a challenging task. Here, we present GLITR (GLobal Identifier of Target Regions), which ac...
Finding Optimal Sets of Enriched Regions in ChIP-Seq Data
The main challenge when analyzing ChIP-Seq data is the identification of DNA-protein binding sites by finding genomic regions that are enriched with sequencing reads. We present a new tool called qips especially suited for processing ChIP-Seq data containing broader enriched regions. Our tool certainly finds all enriched regions that are not exceeded by higher significant alternatives.
CMT: A Constrained Multi-Level Thresholding Approach for ChIP-Seq Data Analysis
Genome-wide profiling of DNA-binding proteins using ChIP-Seq has emerged as an alternative to ChIP-chip methods. ChIP-Seq technology offers many advantages over ChIP-chip arrays, including but not limited to less noise, higher resolution, and more coverage. Several algorithms have been developed to take advantage of these abilities and find enriched regions by analyzing ChIP-Seq data. However, ...
A Hierarchical Semi-Markov Model for Detecting Enrichment with Application to ChIP-Seq Experiments
Chromatin immunoprecipitation followed by direct sequencing (ChIP-Seq) has revolutionized experiments in profiling DNA-protein interactions and chromatin remodeling patterns. However, limited statistical tools are available for modeling and analyzing the ChIP-Seq data thoroughly. We carefully study the data generating mechanism of ChIP-Seq data and propose a new model-based approach for d...
A New Exhaustive Method and Strategy for Finding Motifs in ChIP-Enriched Regions
ChIP-seq, which combines chromatin immunoprecipitation (ChIP) with next-generation parallel sequencing, allows for the genome-wide identification of protein-DNA interactions. This technology poses new challenges for the development of novel motif-finding algorithms and methods for determining exact protein-DNA binding sites from ChIP-enriched sequencing data. State-of-the-art heuristic, exhaust...